pacman:: p_load(sf, spatstat, tmap, rvest, tidyverse)02-2: 2nd Order Spatial Point Patterns Analysis Methods
execute: echo: true #display the code eval: true message: false warning: false freeze: false # true: not render if nothing edited editor: visual —
1 Overview
This exercise uses second-order spatial point pattern analysis and spatstat package to study childcare centres in Singapore. Unlike first-order analysis, which looks only at overall density independently, second-order methods reveal relationships and influences between centres (points) based on distance.
We aim to answer two key questions:
Are childcare centres randomly distributed across Singapore?
If not, where are the areas with higher concentrations?
2 The data
As also used in 1st order SPPA, the following datasets were downloaded from publicly available websites, and both are available in KML and GeoJSON format.
| Dataset Name | Source | Discrption |
|---|---|---|
| Child Care Services | data.gov.sg | Point feature data: contains the locations and attributes of childcare centres. |
| Master plan 2019 Subzone Boundary (No Sea) | singstat | Polygon feature data: represents the URA 2019 Master Plan planning subzone boundaries. |
3 Installing and Loading the R packages
In addition to spatstat, a total of five R packages will be used in this exercise.
| Package | Discription |
|---|---|
| sf | Simple Features, a new R package which handles importing, managing, and processing vector-based geospatial data. |
| spatstat | Provides useful functions for SPPA, which will be called to conduct both 1st and 2nd SPPA and KDE. |
| tmap | Creates high quality static or interactive choropleth maps via leaflet. |
| rvest | Scrapes and extracts data from web pages. |
After installation, we load them into R environment using the code below.
4 Data Import and Preparation
4.1 importing data
The following code chunk shows the steps to first import the Master plan 2019 Subzone Boundary (No Sea) data using st_read, extract the required 4 columns from the Description field, filter out the nearby islands, and finally save the file as mpsz_cl for further analysis.
mpsz_sf <- st_read("data/MasterPlan2019SubzoneBoundaryNoSeaKML.kml") %>%
st_zm(drop = TRUE, what = "ZM") %>%
st_transform(crs = 3414)Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source
`C:\lsrgc\ISSS626-yiqiong-pan\Hands-on_Ex\Hands-on_Ex02\data\MasterPlan2019SubzoneBoundaryNoSeaKML.kml'
using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
extract_kml_field <- function(html_text, field_name) {
if (is.na(html_text) || html_text == "") return(NA_character_)
page <- read_html(html_text)
rows <- page %>% html_elements("tr")
value <- rows %>%
keep(~ html_text2(html_element(.x, "th")) == field_name) %>%
html_element("td") %>%
html_text2()
if (length(value) == 0) NA_character_ else value
}# map_chr of purr (tidyverse) applies a function to each element of a list/vector and returns a character vector.
mpsz_sf <- mpsz_sf %>%
mutate(
REGION_N = map_chr(Description, extract_kml_field, "REGION_N"),
PLN_AREA_N = map_chr(Description, extract_kml_field, "PLN_AREA_N"),
SUBZONE_N = map_chr(Description, extract_kml_field, "SUBZONE_N"),
SUBZONE_C = map_chr(Description, extract_kml_field, "SUBZONE_C")
) %>%
select(-Name, -Description) %>%
relocate(geometry, .after = last_col())mpsz_cl <- mpsz_sf %>%
filter(SUBZONE_N != "SOUTHERN GROUP",
PLN_AREA_N != "WESTERN ISLANDS",
PLN_AREA_N != "NORTH-EASTERN ISLANDS")write_rds(mpsz_cl,
"data/mpsz_cl.rds")The code chuck below imports downloaded ChildCareServices data to R as sf data frame as childcare_sf by using st_read, coverts 3d to 2d (st_zm) and finally transform the CRS from WGS84 to SVY21.
childcare_sf <- st_read("data/ChildCareServices.geojson") %>%
st_zm(drop = TRUE, what = "ZM") %>% # Drop Z and M to convert from multi-dimensional to 2d (XY)
st_transform(crs = 3414)Reading layer `ChildCareServices' from data source
`C:\lsrgc\ISSS626-yiqiong-pan\Hands-on_Ex\Hands-on_Ex02\data\ChildCareServices.geojson'
using driver `GeoJSON'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Using the tmap mapping methods, the code chunk below creates a map combining childcare_sf and mpsz_cl.
tm_shape(mpsz_cl)+
tm_polygons() +
tm_shape(childcare_sf) +
tm_dots(size = 0.3)
Alternatively, an interactive thematic map can be plotted using the code below. The interactive map is easy to navigate and query intuitively. It is optional to change the background map layer(choices: ESRI.WorldGrayCanvas(default), OpenStreetMap, ESRI.WorldTopoMap).
tmap_mode('view')
tm_shape(childcare_sf) +
tm_dots() #creates a layer of dots to visualise point data on a map.tmap_mode('plot') #switch back static mapsIt is advised to always switch back to plot mode to save connection consumption and limit the number of interactive maps to 10 in one documents when publishing.
4.2 Geospatial Data Wrangling
In this section, the data (sf objects) will be converted to spatstat data structure: ppp (for point data) and owin for observation window.
Here we use as.ppp() of spatstat package to covert the point data childcare_sf to ppp file, confirm the change using class() and have a quick overview of the data statistics via summary().
childcare_ppp <- as.ppp(childcare_sf)
class(childcare_ppp)[1] "ppp"
summary(childcare_ppp)Marked planar point pattern: 1925 points
Average intensity 2.417323e-06 points per square unit
Coordinates are given to 11 decimal places
Mark variables: Name, Description
Summary:
Name Description
Length:1925 Length:1925
Class :character Class :character
Mode :character Mode :character
Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
(33590 x 23700 units)
Window area = 796335000 square units
plot(unmark(childcare_ppp), main = "childcare_ppp") #drops the marks, since simple plot() is not displaying properly showing broken tags <th>, <td>, <table> etc
Before moving forwards, let’s check if there are any duplicated points.
any(duplicated(childcare_ppp))[1] FALSE
Similarly, the owin object can be created using the function as.owin() for polygon data. After the conversion, the class() and plot() functions can be used to verify that the object is of the correct class and that the data retains its original shape.
sg_owin <- as.owin(mpsz_cl)
class(sg_owin)[1] "owin"
plot(sg_owin)
The code chunk below combines ppp and owin into one ppp file which means it updates the window of childcare_ppp to sg_owin and keeps the points that fall inside.
childcareSG_PPP = childcare_ppp[sg_owin]
class(childcareSG_PPP)[1] "ppp"
childcareSG_PPPMarked planar point pattern: 1925 points
Mark variables: Name, Description
window: polygonal boundary
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
summary(childcareSG_PPP)Marked planar point pattern: 1925 points
Average intensity 2.875208e-06 points per square unit
Coordinates are given to 11 decimal places
Mark variables: Name, Description
Summary:
Name Description
Length:1925 Length:1925
Class :character Class :character
Mode :character Mode :character
Window: polygonal boundary
41 separate polygons (26 holes)
vertices area relative.area
polygon 1 285 1.61128e+06 2.41e-03
polygon 2 27 1.50315e+04 2.25e-05
polygon 3 (hole) 41 -4.01660e+04 -6.00e-05
polygon 4 (hole) 317 -5.11280e+04 -7.64e-05
polygon 5 (hole) 3 -4.14100e-04 -6.19e-13
polygon 6 30 2.80002e+04 4.18e-05
polygon 7 (hole) 4 -2.86396e-01 -4.28e-10
polygon 8 (hole) 3 -1.81439e-04 -2.71e-13
polygon 9 (hole) 3 -5.99531e-04 -8.95e-13
polygon 10 (hole) 3 -3.04560e-04 -4.55e-13
polygon 11 (hole) 3 -4.46108e-04 -6.66e-13
polygon 12 (hole) 5 -2.44408e-04 -3.65e-13
polygon 13 (hole) 5 -3.64686e-02 -5.45e-11
polygon 14 71 8.18750e+03 1.22e-05
polygon 15 (hole) 38 -7.79904e+03 -1.16e-05
polygon 16 91 1.49663e+04 2.24e-05
polygon 17 (hole) 395 -7.38124e+03 -1.10e-05
polygon 18 40 1.38607e+04 2.07e-05
polygon 19 (hole) 11 -8.36705e+01 -1.25e-07
polygon 20 (hole) 3 -2.33435e-03 -3.49e-12
polygon 21 45 2.51218e+03 3.75e-06
polygon 22 139 3.22293e+03 4.81e-06
polygon 23 148 3.10395e+03 4.64e-06
polygon 24 (hole) 4 -1.72650e-04 -2.58e-13
polygon 25 75 1.73526e+04 2.59e-05
polygon 26 83 5.28920e+03 7.90e-06
polygon 27 106 3.04104e+03 4.54e-06
polygon 28 71 5.63061e+03 8.41e-06
polygon 29 10 1.99717e+02 2.98e-07
polygon 30 (hole) 3 -1.37223e-02 -2.05e-11
polygon 31 (hole) 3 -8.68789e-04 -1.30e-12
polygon 32 (hole) 3 -3.39815e-04 -5.08e-13
polygon 33 (hole) 3 -4.52041e-05 -6.75e-14
polygon 34 (hole) 3 -3.90173e-05 -5.83e-14
polygon 35 (hole) 3 -9.59845e-05 -1.43e-13
polygon 36 (hole) 8 -4.28707e-01 -6.40e-10
polygon 37 (hole) 4 -2.18619e-04 -3.27e-13
polygon 38 (hole) 6 -8.37554e-01 -1.25e-09
polygon 39 (hole) 5 -2.92235e-04 -4.36e-13
polygon 40 14053 6.67892e+08 9.98e-01
polygon 41 (hole) 3 -7.43616e-06 -1.11e-14
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units
(53270 x 28810 units)
Window area = 669517000 square units
Fraction of frame area: 0.436
plot(unmark(childcareSG_PPP), main = "childcare_SG_PPP")
4.3 Extracting the Study Area
We focus on the childcare centres in the four areas: Punggol, Tampines, Choa Chu Kang and Jurong West.
The code chunk uses filter() to create a new variable for each area and plot() for quick preview.
pg <- mpsz_cl %>%
filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_cl %>%
filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_cl %>%
filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_cl %>%
filter(PLN_AREA_N == "JURONG WEST")par(mfrow=c(2,2))
plot(st_geometry(pg), main = "Ponggol")
plot(st_geometry(tm), main = "Tampines")
plot(st_geometry(ck), main = "Choa Chu Kang")
plot(st_geometry(jw), main = "Jurong West")
pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)The code chunk below subsets the dataset to the study areas, rescales the unit of measurement from metre to kilometre and finally plot the areas with childcare points.
childcare_pg_ppp = childcare_ppp[pg_owin] #crop childcare points to Punggol
childcare_tm_ppp = childcare_ppp[tm_owin]
childcare_ck_ppp = childcare_ppp[ck_owin]
childcare_jw_ppp = childcare_ppp[jw_owin]par(mfrow=c(2,2))
plot(unmark(childcare_pg_ppp),
main = "Punggol")
plot(unmark(childcare_tm_ppp),
main = "Tampines")
plot(unmark(childcare_ck_ppp),
main = "Choa Chu Kang")
plot(unmark(childcare_jw_ppp),
main = "Jurong West")
5 Second-order Spatial Point Patterns Analysis
Let us examine the relationships between points at subzone level.
6 Analysing Spatial Point Process Using G-Function
The G-function Gest() shows distance from an event to its nearest event, with Monte Carlo simulation envelop() used to test against complete spatial randomness.
First we set a fixed random seed to ensure reproducibility.
set.seed(1234)- The code chunk below calculates the G-function. The plot shows childcare centres are clustered within 300 metres but rather random beyond the scale.
G_CK = Gest(childcare_ck_ppp, correction = "border") #correction is border
G_CKFunction value object (class 'fv')
for the function r -> G(r)
......................................................
Math.label Description
r r distance argument r
theo G[pois](r) theoretical Poisson G(r)
rs hat(G)[bord](r) border corrected estimate of G(r)
......................................................
Default plot formula: .~r
where "." stands for 'rs', 'theo'
Recommended range of argument r: [0, 228.98]
Available range of argument r: [0, 550.4]
plot(G_CK,xlim= c(0,500)) #xlim zooms the x-axis, CDF rises above csr before 300 and crosses after 300
- the code chunk below test Complete Spatial Randomness.
H0 = The distribution of childcare services at Choa Chu Kang are randomly distributed.
H1 = The distribution of childcare services at Choa Chu Kang are not randomly distributed.
The null hypothesis will be rejected is p-value is smaller than alpha value of 0.001.
Since the number of simulations is 999, for two-sided test, nrank = (nsim + 1) * 0.002 /2
nrank is 1 at default. This is also applicable to the following tests.
G_CK.csr <- envelope(childcare_ck_ppp, Gest, nsim= 999)Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60..
.......70.........80.........90.........100.........110.........120.........130
.........140.........150.........160.........170.........180.........190........
.200.........210.........220.........230.........240.........250.........260......
...270.........280.........290.........300.........310.........320.........330....
.....340.........350.........360.........370.........380.........390.........400..
.......410.........420.........430.........440.........450.........460.........470
.........480.........490.........500.........510.........520.........530........
.540.........550.........560.........570.........580.........590.........600......
...610.........620.........630.........640.........650.........660.........670....
.....680.........690.........700.........710.........720.........730.........740..
.......750.........760.........770.........780.........790.........800.........810
.........820.........830.........840.........850.........860.........870........
.880.........890.........900.........910.........920.........930.........940......
...950.........960.........970.........980.........990........
999.
Done.
G_CK.csrPointwise critical envelopes for G(r)
and observed value for 'childcare_ck_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
Math.label Description
r r distance argument r
obs hat(G)[obs](r) observed value of G(r) for data pattern
theo G[theo](r) theoretical value of G(r) for CSR
lo hat(G)[lo](r) lower pointwise envelope of G(r) from simulations
hi hat(G)[hi](r) upper pointwise envelope of G(r) from simulations
.....................................................................
Default plot formula: .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 220.38]
Available range of argument r: [0, 550.4]
plot(G_CK.csr, xlim= c(0,500))
The plot of Monte Carlo simulations (999 CSR sims, KM correction, 99.8% confidence) show the observed G in Choa Chu Kang is above the band of 220m, which implies clustering at short distances.
- The code chunk below calculates the G-function.
G_TM = Gest(childcare_tm_ppp, correction = "best") #correction is selected by package
G_TMFunction value object (class 'fv')
for the function r -> G(r)
...................................................................
Math.label Description
r r distance argument r
theo G[pois](r) theoretical Poisson G(r)
km hat(G)[km](r) Kaplan-Meier estimate of G(r)
hazard hat(h)[km](r) Kaplan-Meier estimate of hazard function h(r)
theohaz h[pois](r) theoretical Poisson hazard function h(r)
...................................................................
Default plot formula: .~r
where "." stands for 'km', 'theo'
Recommended range of argument r: [0, 258.51]
Available range of argument r: [0, 807.07]
plot(G_TM,xlim = c(0,800)) #xlim changes the scale
- the code chunk below test Complete Spatial Randomness.
Ho = The distribution of childcare services at Tampines are randomly distributed.
H1 = The distribution of childcare services at Tampines are not randomly distributed.
The null hypothesis will be rejected is p-value is smaller than alpha value of 0.001.
Since the number of simulations is 999, for two-sided test, nrank = (nsim + 1) * 0.002 /2
nrank is 1 at default. This is also applicable to the following tests. *
G_TM.csr <- envelope(childcare_tm_ppp, Gest, Correction = "all", nsim = 999)Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60..
.......70.........80.........90.........100.........110.........120.........130
.........140.........150.........160.........170.........180.........190........
.200.........210.........220.........230.........240.........250.........260......
...270.........280.........290.........300.........310.........320.........330....
.....340.........350.........360.........370.........380.........390.........400..
.......410.........420.........430.........440.........450.........460.........470
.........480.........490.........500.........510.........520.........530........
.540.........550.........560.........570.........580.........590.........600......
...610.........620.........630.........640.........650.........660.........670....
.....680.........690.........700.........710.........720.........730.........740..
.......750.........760.........770.........780.........790.........800.........810
.........820.........830.........840.........850.........860.........870........
.880.........890.........900.........910.........920.........930.........940......
...950.........960.........970.........980.........990........
999.
Done.
G_TM.csrPointwise critical envelopes for G(r)
and observed value for 'childcare_tm_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
Math.label Description
r r distance argument r
obs hat(G)[obs](r) observed value of G(r) for data pattern
theo G[theo](r) theoretical value of G(r) for CSR
lo hat(G)[lo](r) lower pointwise envelope of G(r) from simulations
hi hat(G)[hi](r) upper pointwise envelope of G(r) from simulations
.....................................................................
Default plot formula: .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 258.51]
Available range of argument r: [0, 807.07]
plot(G_TM.csr, xlim = c(0,800))
Similarly, the plot of Monte Carlo simulations (999 CSR sims, KM correction, 99.8% confidence) show the observed G in Tampines is above the band of 258m, which implies clustering at short distances.
7 Analysing Spatial Point Process Using F-Function
The F-function Fest() estimates the distribution of empty-space distances, which means the distance from a random location to the nearest event. It tells about the gaps. Additionally Monte Carlo envelopes are used to assess complete spatial randomness.
- Computing F-function estimation
F_CK = Fest(childcare_ck_ppp)
F_CKFunction value object (class 'fv')
for the function r -> F(r)
.....................................................................
Math.label Description
r r distance argument r
theo F[pois](r) theoretical Poisson F(r)
cs hat(F)[cs](r) Chiu-Stoyan estimate of F(r)
rs hat(F)[bord](r) border corrected estimate of F(r)
km hat(F)[km](r) Kaplan-Meier estimate of F(r)
hazard hat(h)[km](r) Kaplan-Meier estimate of hazard function h(r)
theohaz h[pois](r) theoretical Poisson hazard h(r)
.....................................................................
Default plot formula: .~r
where "." stands for 'km', 'rs', 'cs', 'theo'
Recommended range of argument r: [0, 304.33]
Available range of argument r: [0, 552.75]
plot(F_CK, xlim= c(0,500))
- Performing Complete Spatial Randomness Test
F_CK.csr <- envelope(childcare_ck_ppp, Fest, nsim = 999)Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60..
.......70.........80.........90.........100.........110.........120.........130
.........140.........150.........160.........170.........180.........190........
.200.........210.........220.........230.........240.........250.........260......
...270.........280.........290.........300.........310.........320.........330....
.....340.........350.........360.........370.........380.........390.........400..
.......410.........420.........430.........440.........450.........460.........470
.........480.........490.........500.........510.........520.........530........
.540.........550.........560.........570.........580.........590.........600......
...610.........620.........630.........640.........650.........660.........670....
.....680.........690.........700.........710.........720.........730.........740..
.......750.........760.........770.........780.........790.........800.........810
.........820.........830.........840.........850.........860.........870........
.880.........890.........900.........910.........920.........930.........940......
...950.........960.........970.........980.........990........
999.
Done.
F_CK.csrPointwise critical envelopes for F(r)
and observed value for 'childcare_ck_ppp'
Edge correction: "km"
Obtained from 999 simulations of CSR
Alternative: two.sided
Significance level of pointwise Monte Carlo test: 2/1000 = 0.002
.....................................................................
Math.label Description
r r distance argument r
obs hat(F)[obs](r) observed value of F(r) for data pattern
theo F[theo](r) theoretical value of F(r) for CSR
lo hat(F)[lo](r) lower pointwise envelope of F(r) from simulations
hi hat(F)[hi](r) upper pointwise envelope of F(r) from simulations
.....................................................................
Default plot formula: .~r
where "." stands for 'obs', 'theo', 'hi', 'lo'
Columns 'lo' and 'hi' will be plotted as shading (by default)
Recommended range of argument r: [0, 304.33]
Available range of argument r: [0, 552.75]
plot(F_CK.csr, xlim=c(0,500))
Computing F-function estimation
Performing Complete Spatial Randomness Test
8 Analysing Spatial Point Process Using K-Function
8.1 Choa Chu Kang planning area
8.2 Tampines planning area
9 Analysing Spatial Point Process Using L-Function
9.1 Choa Chu Kang planning area
9.2 Tampines planning area
10 Reference
Kam, T. S. 2nd Order Spatial Point Patterns Analysis Methods. R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap05